-
Notifications
You must be signed in to change notification settings - Fork 40
Improve whitespace handling in Lexer & Parser, warn on bad whitespace around binops #1249
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This comment was marked as resolved.
This comment was marked as resolved.
| /** | ||
| * Negative lookahead | ||
| */ | ||
| def lookbehind(offset: Int): Token = | ||
| tokens(position - offset) | ||
| def sawNewlineLast: Boolean = { | ||
| @tailrec | ||
| def go(position: Int): Boolean = | ||
| if position < 0 then fail("Unexpected start of file") | ||
|
|
||
| tokens(position).failOnErrorToken(position) match { | ||
| case token if isSpace(token.kind) && token.kind != Newline => go(position - 1) | ||
| case token if token.kind == Newline => true | ||
| case _ => false | ||
| } | ||
|
|
||
| go(position - 1) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously we did lookbehind(1) == Newline, but that no longer works, the lookbehind would need to ignore all whitespace tokens except for a newline... I found that to be too weird, so I replaced it with a specialised function to just check if the last non-space token was a newline (feel free to suggest a better name)
| // Check that the current token is surrounded by whitespace. If not, soft fail. | ||
| private def checkBinaryOpWhitespace(): Unit = { | ||
| // position points to the operator token in the raw token array | ||
| val wsBefore = position > 0 && isSpace(tokens(position - 1).kind) | ||
| val wsAfter = position + 1 < tokens.length && isSpace(tokens(position + 1).kind) | ||
|
|
||
| if (!wsBefore || !wsAfter) { | ||
| softFail(s"Missing whitespace around binary operator", position, position) | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now I'm very happy about the implementation :)
dvdvgt
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overally, this looks good to me. I am in favor deferring whitespace checks around the operators to the parser instead of the lexer. Just some minor comments below.
| else | ||
| Token(tokenStartPosition.offset, position.offset - 1, kind) | ||
|
|
||
| private def skipWhitespace(): Unit = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be called skipNewline instead maybe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed it outright, it's not needed anymore, I think.
| case (' ', _) => advanceSpaces() | ||
| case ('\t', _) => advanceSpaces() | ||
| case ('\n', _) => advanceWith(TokenKind.Newline) | ||
| case ('\r', '\n') => advance2With(TokenKind.Newline) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am a bit confused now when newlines and spaces are skipped and when they are emitted. next calls skipWhitespace but also not always. Perhaps a brief comment on whitespace handling would be nice?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I redid this part, whitespace ought to be always emitted. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, seems reasonable
marvinborner
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are people (and formatters) that like to omit the spaces to make the precedence of operators more clear to the reader (see PEP8 "use your own judgement" 😉). Especially with more complex precedence rules (like for parsing or chaining (e.g. arrow for computation UFCS)), we might want to relax the enforcing of whitespace.
Just want to leave this comment here, I'm fine with this for now though :)
|
I have changed it to only emit a warning :) |
|
cc @marvinborner: do your feelings change now that it's merely a warning, and one that doesn't crash, allowing you to even run a program with parser warnings? :) [see tests] |
| // Check that the current token is surrounded by whitespace. If not, soft fail. | ||
| private def checkBinaryOpWhitespace(): Unit = { | ||
| val wsBefore = position > 0 && isSpace(tokens(position - 1).kind) | ||
| val wsAfter = position + 1 < tokens.length && isSpace(tokens(position + 1).kind) | ||
|
|
||
| if (!wsBefore || !wsAfter) { | ||
| warn(s"Missing whitespace around binary operator", position, position) | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could ever-so-slightly generalise this:
// Require that the current token is surrounded by whitespace. If not, soft fail.
private def requireSpacesAround(): Unit = {
val wsBefore = position > 0 && isSpace(tokens(position - 1).kind)
val wsAfter = position + 1 < tokens.length && isSpace(tokens(position + 1).kind)
if (!wsBefore || !wsAfter) {
warn(s"Missing whitespace around '${TokenKind.explain(peek.kind)}'", position, position)
}
}and then use it for other places where we really want spaces on both sides (like the = in val foo = ... / def foo = ... / ...) :)
|
Fine by me, sure. In the future we could maybe add flags to hide warnings, including such parser warnings, if desired. |
Resolves #462
The parser soft fails (fails with recovery) if it encounters a binary operator that is not surrounded by some whitespace:
The implementation is a bit hacky, see the note in the comment.